A Word Matching Algorithm in Handwritten Arabic Recognition Using Multiple-Sequence Weighted Edit Distances
Abstract
No satisfactory solutions are yet available for the offline recognition of handwritten cursive words, including Arabic words. Word matching algorithms can greatly improve OCR output when the words to be recognized come from a known, limited vocabulary. This paper describes the word matching algorithm used in JU-OCR2, an optical character recognition system for handwritten Arabic words. The system achieves state-of-the-art accuracy through multiple techniques, including an efficient word matching algorithm. On the IfN/ENIT database of handwritten Arabic words, the algorithm turns an average sequence error of 32.3% into an average word error of just 5.0%. The algorithm is a weighted version of the edit distance algorithm, and the weighted version has a 5.0% advantage over the plain edit distance algorithm. It selects the best match using a set of multiple probable sequences produced by the sequence transcription stage. Using multiple sequences instead of one reduces the average error by 27.0% compared with the weighted edit distance algorithm applied to a single sequence. Compared with an algorithm used in a leading system, this algorithm offers 6.7% lower average word error on the two main test sets.
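The idea in the abstract can be illustrated with a minimal Python sketch. It is not the JU-OCR2 implementation: the abstract does not give the weights or the rule for combining sequences. The sketch assumes that substitutions between easily confused symbols cost less than other substitutions, and that a lexicon word is scored by its smallest distance to any candidate sequence. The names `CONFUSABLE`, `sub_cost`, `weighted_edit_distance`, and `best_match` are hypothetical, and ASCII letters stand in for Arabic character labels.

```python
from typing import Callable, Iterable, Sequence, Tuple

# Hypothetical confusion pairs; a real system would learn these from data.
CONFUSABLE = {frozenset("bd"), frozenset("nu")}


def sub_cost(a: str, b: str) -> float:
    """Substitution weight: free if equal, cheap if easily confused."""
    if a == b:
        return 0.0
    if frozenset((a, b)) in CONFUSABLE:
        return 0.5  # assumed weight, for illustration only
    return 1.0


def weighted_edit_distance(
    source: Sequence[str],
    target: Sequence[str],
    substitution: Callable[[str, str], float] = sub_cost,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
) -> float:
    """Dynamic-programming edit distance with per-operation weights."""
    # prev[j] is the cost of turning the processed prefix of source
    # into the first j symbols of target.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * del_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,             # delete s
                curr[j - 1] + ins_cost,         # insert t
                prev[j - 1] + substitution(s, t),  # substitute s with t
            ))
        prev = curr
    return prev[-1]


def best_match(
    candidates: Iterable[Sequence[str]],
    lexicon: Iterable[str],
) -> Tuple[str, float]:
    """Return the lexicon word closest to any candidate sequence.

    Assumption: a word's score is its minimum distance over all candidates.
    """
    candidates = list(candidates)
    scored = (
        (word, min(weighted_edit_distance(c, word) for c in candidates))
        for word in lexicon
    )
    return min(scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    lexicon = ["dog", "dig", "bag"]
    # A single transcription versus two alternative transcriptions.
    print(best_match(["bog"], lexicon))
    print(best_match(["bog", "dig"], lexicon))
```

With unit costs, the transcription `bog` would be equally far from `dog` and `bag`. The reduced weight for the assumed `b`/`d` confusion breaks that tie, and a second candidate sequence can replace the winner with an exact match.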
Similar resources
Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
To facilitate data entry into computers and digitization, automatic recognition of printed texts and manuscripts is a considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
Connected Component Based Word Spotting on Persian Handwritten image documents
Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...
Combination of multiple classifiers for handwritten word recognition
Because of large shape variations in human handwriting, the recognition accuracy for cursive handwritten words is hardly satisfactory using a single classifier. In this paper we introduce a framework to combine the results of multiple classifiers and present an intuitive run-time weighted opinion pool (RWOP) combination approach for recognizing cursive handwritten words with a large vocabulary. The in...
A two-stage method for recognizing handwritten Farsi words using adaptive blocking of the image gradient
This paper presents a two-step method for offline handwritten Farsi word recognition. In the first step, to improve recognition accuracy and speed, an algorithm is proposed for initially eliminating lexicon entries unlikely to match the input image. For lexicon reduction, the lexicon words are clustered using the ISOCLUS and hierarchical clustering algorithms. Clustering is based on the featur...
Holistic Farsi handwritten word recognition using gradient features
In this paper we address the issue of recognizing Farsi handwritten words. Two types of gradient features are extracted from a sliding vertical stripe which sweeps across a word image. These are directional and intensity gradient features. The feature vector extracted from each stripe is then coded using the Self Organizing Map (SOM). In this method each word is modeled using the discrete Hidde...